Learning with Locally Linear Feature Regularization

Authors

  • Ted Sandler
  • John Blitzer
  • Lyle Ungar
Abstract

Machine learning models are successfully being used for problems in language, vision, and biology that have millions or tens of millions of features. A common approach to alleviating the complexity of high-dimensional feature spaces is to penalize the L1 or L2 norm of the parameter vector. We may be able to design more effective regularizers, though, if we possess external information about which features should behave similarly. For example, word co-occurrence statistics or thesauri such as WordNet can indicate similarities which are useful for predicting the topic or sentiment of a document, and biological databases of gene pathways give similarities of genes which can predict disease based on gene expression levels. We present a simple framework in which similarities between features are encoded as a graph on features and a regression model is learned whose feature coefficients are similar for neighboring nodes. Our regularization criterion is closely related to locally linear embedding, a method for learning low-dimensional embeddings of unlabeled, high-dimensional data [1]. Because of this, we name it locally linear feature regularization (LLFR).
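
The abstract sketches the idea without stating the objective in closed form. As a rough illustration only (not the authors' code), the NumPy sketch below implements one natural reading: add a penalty alpha * ||(I - P) w||^2 to a ridge-style least-squares objective, where row i of P holds the weights that reconstruct feature i's coefficient from the coefficients of its graph neighbors, in the spirit of locally linear embedding. The function name llfr_ridge, the reconstruction matrix P, and the hyperparameters alpha and beta are illustrative assumptions.

import numpy as np

def llfr_ridge(X, y, P, alpha=1.0, beta=1.0):
    """Least-squares regression with a locally-linear feature regularizer (sketch).

    Minimizes  ||X w - y||^2 + alpha * ||(I - P) w||^2 + beta * ||w||^2,
    so each coefficient w_i is pulled toward the weighted combination
    sum_j P[i, j] * w_j of its neighbors' coefficients.
    """
    d = X.shape[1]
    M = np.eye(d) - P                      # (I - P); P holds neighbor reconstruction weights
    A = X.T @ X + alpha * (M.T @ M) + beta * np.eye(d)
    b = X.T @ y
    return np.linalg.solve(A, b)           # closed-form generalized ridge solution

# Toy usage: three features; features 0 and 1 are declared "similar" on the graph.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 1.0, -0.5]) + 0.1 * rng.normal(size=50)
P = np.array([[0.0, 1.0, 0.0],             # feature 0 reconstructed from feature 1
              [1.0, 0.0, 0.0],             # feature 1 reconstructed from feature 0
              [0.0, 0.0, 1.0]])            # feature 2 reconstructs itself (unpenalized by the graph term)
w = llfr_ridge(X, y, P, alpha=10.0, beta=0.1)
print(w)                                   # coefficients 0 and 1 are pulled toward each other

With a large alpha, the fitted coefficients for the two neighboring features end up close in value even when the data alone would not force this, which is the intended effect of the graph-based regularizer.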

Similar papers

Feature Selection via Joint Embedding Learning and Sparse Regression

The problem of feature selection has attracted considerable research interest in the past few years. Traditional learning-based feature selection methods separate embedding learning and feature ranking. In this paper, we introduce a novel unsupervised feature selection approach via Joint Embedding Learning and Sparse Regression (JELSR). Instead of simply employing the graph Laplacian for embeddi...

Locally Non-Linear Learning via Feature Induction and Structured Regularization in Statistical Machine Translation

Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. The combination of a simple learning technique and such a simple model means the overall learning process may ignore useful non-linear information present in the feature set. This places significant ...

Locally Non-Linear Learning for Statistical Machine Translation via Discretization and Structured Regularization

Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. In this work, we introduce a technique for learning arbitrary, rule-local, nonlinear feature transforms that improve model expressivity, but do not sacrifice the efficient inference and learning asso...

Short term load forecast by using Locally Linear Embedding manifold learning and a hybrid RBF-Fuzzy network

The aim of short-term load forecasting is to predict the electric power load for unit commitment, system reliability evaluation, economic dispatch, and so on. Short-term load forecasting plays an important role in traditional non-cooperative power systems. Moreover, in a restructured power system a generator company (GENCO) should predict the system demand and its corr...

Regularized Learning with Feature Networks

In this thesis, we present Regularized Learning with Feature Networks (RLFN), an approach for regularizing the feature weights of a regression model when one has prior beliefs about which regression coefficients are similar in value. These similarities are specified as a feature network whose semantic interpretation is that...

Journal:

Volume   Issue

Pages  -

Publication date: 2008